Feedback topology and XOR-dynamics in Boolean networks with varying input 

structure 
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We analyse a model of fixed in-degree Random Boolean Networks in which the fraction of input- 
receiving nodes is controlled by the parameter 7. We investigate analytically and numerically the 
dynamics of graphs under a parallel XOR updating scheme. This scheme is interesting because it 
is accessible analytically and its phenomenology is at the same time under control, and as rich 
as the one of general Boolean networks. We give analytical formulas for the dynamics on general 
graphs, showing that with a xOR-type evolution rule, dynamic features are direct consequences of 
the topological feedback structure, in analogy with the role of relevant components in Kauffman 
networks. Considering graphs with fixed in-degree, we characterize analytically and numerically the 
feedback regions using graph decimation algorithms (Leaf Removal). With varying 7, this graph 
ensemble shows a phase transition that separates a tree-like graph region from one in which feedback 
components emerge. Networks near the transition point have feedback components made of disjoint 
loops, in which each node has exactly one incoming and one outgoing link. Using this fact we 
provide analytical estimates of the maximum period starting from topological considerations. 

PACS numbers: 89.75.Hc, 05.65.+b, 89.75.Fb 



I. INTRODUCTION 

Biological networks are graphs representingthe basic 
interactions between molecules in a living cell [11, H, H, 13] • 
They are generally composed of a fairly large number 
of elements and for this reason it is unfeasible to treat 
all the biochemical processes in detail. For example, a 
transcription network [1,0] of a simple cell counts 
some thousands of nodes, which represent the genes of a 
cell. Given the interaction structure, it is important to 
establish which genes are active in a given time or en- 
vironment, or under a given stimulus. In order to fulfill 
this task, it is necessary to develop coarse-grained mod- 
els for gene expression, such as discrete systems, in which 
variables on the nodes represent the (discretized) expres- 
sion of single genes, and directed connections stand for 
their interactions [8|, |9| . These abstract models are use- 
ful to study on general grounds the emergent cooperative 
behaviour of gene expression, as biological function is in- 
creasingly being recognized as emerging from global phe- 
nomena rather than from the expression of single genes 

The simplest model of this kind are Random Boolean 
Networks (RBNs) introduced by S. Kauffman in 1969 
In this model, N elements take binary values and 
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interact with some random coupling functions. In the 
standard Kauffman model, the configuration of an ele- 
ment is set by Boolean functions whose values depend on 
a fixed number k of inputs. The system is specified by its 
topology (a graph with k inputs regulating each node), 
a synchronous updating scheme and the choice for the 
ensemble of Boolean functions (for an introductory re- 
view see Refs. |H Q El). Technically speak ing, the 
behaviour of the system is fully characterized by its cy- 
cles (or fixed points) and their basins of attraction. If a 
cycle contains exponentially many (in N) configurations, 
the behaviour is called chaotic. Otherwise it is called 
ordered. According to the original Kauffman interpreta- 
tion, if the system is in a chaotic state, it cannot exhibit 
specific behaviour in response to external stimuli. More 
specifically, a realization of the model is interpreted as 
a genome and an attractor as a possible cell type (this 
particular interpretation is today considered out of date 
[la]). If the corresponding cycle period is too long, this 
hypothetical cell type would never realize it. For this rea- 
son, networks of biological interest should lie between the 
ordered and the chaotic phase (the so-called critical net- 
works [13]) j where attractor cycles are neither too short 
nor exponential. 

There are two problems concerning Kauffman's model 
in relation with genetic networks. First, the ensemble 
contains only graphs entirely made of feedback, while 
typically this is not the case of biological (e.g. transcrip- 
tion) networks, which usually have some "sensor" nodes 
that respond to external conditions [ill [lji HI . Thus, it 
is useful to modify the model and consider networks with 
well-defined input structure. The second problem is con- 
nected to the choice of the ensemble of Boolean functions. 
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While on biological grounds it is difficult to characterize 
this class, from a mathematical viewpoint the choice of 
functions strongly conditions the behaviour of the model. 
Indeed, since most functions are constant, or "canaliz- 
ing" [3, [l5| with respect to some variables, this creates 
an "effective topology" , which does not correspond to the 
underlying interaction graph. For this reason, the study 
of attractors in Kauffman networks is complex (for recent 
results see, for example, [2l|, [22|, HH). Also the study of 
critical networks with canalizing Boolean functions leads 
to the conclusion that this choice does not decrease at- 
tractor sizes and render the model more well-behaved as 
previously expected [2lj . 

In this work, we address these questions in a controlled 
way, with the following choices. First, we work with 
a graph ensemble with a parameter that regulates the 
fraction of input nodes, as done in p rvious studies that 
focused on fixed points 0, [H, l26l . H3] ■ Secondly, we 
choose to use the ensemble of XOR or totally non- canal- 
izing functions. In other words, we consider the situation 
in which all the existing links in the network have a role 
in the dynamics of real systems. The aim is to study the 
repercussions of the chosen topology on the dynamics 
and how this restricted selection of Boolean functions, in 
which the output changes if any one of the inputs changes 
its value, strictly relates these two aspects of the model. 
From the technical viewpoint, this dynamics has the ad- 
vantage to be approachable with linear algebra using the 
finite Galois field GF{2) [H HI, the set {0,1} where 
addition between elements is equivalent to the logic XOR 
and multiplication to the logic and. 

Our main results are the following. With varying 7, we 
observe a phase transition, similar to the one observed in 
Kauffman networks, from a region characterized by an 
ordered dynamic behaviour to a region in which the dy- 
namics is chaotic. From a topological point of view the 
same transition divides a region characterized by tree- like 
graphs from one in which extensive feedback is present. 
The structure of networks around the critical point ap- 
pears simplified, in the sense that the feedback regions 
of this kind of networks are organized in simple disjoint 
loops. We use the topological structure of critical point 
networks to estimate the maximum period times. The 
paper is organized as follows. After giving the basic def- 
initions in Section HH we present the main results in Sec- 
tion IIIIl In the last sections we discuss the results and 
the parallels with the scaling approach to general Kauff- 
man networks (30j . 



II. DEFINITION OF THE MODEL 

We consider networks of N nodes, of which only M 
receive input, and define 7 = M/N. Each of these regu- 
lated nodes receives inputs from exactly k randomly and 
independently chosen other nodes. Consequently the in- 
degree is k or 0, while the out-degree distribution is bino- 
mial, and in the "thermodynamic" limit N — > 00 and 7 



constant is well approximated by a Poisson distribution 
of mean kj, 



Pout(lk,n) 



= e~ 7fc 



( 7 fc) r 



As a consequence, the graphs contain nodes with no out- 
put {leaves) and N — M roots, i.e. variables without 
inputs (Fig. [I}. This ensemble is conventionally used in 
diluted spin-glass models and constraint-satisfaction net- 
works [33l [3J| . The standard Kauffman graph ensemble 
can be obtained considering the special case 7 = 1 in 
which all variables are regulated. The graph ensemble is 
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FIG. 1: A network with N = 9, in-degree k = 2 and 7 ~ 0.55. 
Red circles (colors online) indicate roots, blue circles leaves 
and green circles the inner nodes. 



specified byaMxJV connectivity matrix A where ay = 1 
if the edge j — > i exists, and zero otherwise. It is useful 
to arrange the columns of A in order to reserve the first 
M indices to the nodes that receive inputs. This matrix 
can further be divided into two submatrices A = (S \ R). 
S is the M x M matrix that describes the interaction be- 
tween regulated elements, while the columns of R contain 
the information on the outputs of the free variables. 

We consider a dynamics on the graphs of this ensemble, 
specified by assigning Boolean variables to each of the N 
nodes and interactions through XOR coupling functions. 
A global state a is defined as the set of configurations 
assumed by all the nodes of the network. The initial 
conditions determine the state of the root nodes since 
they do not have input and their values remain fixed 
during the evolution. Root nodes are considered as ex- 
ternal inputs and for this reason we can think that the 
initial conditions represent "the external world" . The 
configuration space, formed by all global states, contains 
2 N states. Since the system is finite, starting from some 
initial global state, the deterministic dynamics leads to 
periodically repeated states, possibly after a transient 
time. In other words, the system performs a trajectory 
in the state space and eventually arrives to an attractor 
of length T, where the states are periodic in time. The 
attractor has length T if a(t+T) = a(t). We call this a T- 
cycle or a fixed point, in case of unitary length. The basin 
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of attraction of an attractor is the set of global states 
that reach the attractor, including the attractor states. 
A transient state is a state that belongs to a basin but 
is not part of an attractor. The TV-dimensional Boolean 
vector a it) can be written as a it) — ix{t) , y(t)). The M- 
dimensional vector a?(t) denotes the state of the regulated 
variables and the (N — M)-dimensional vector y(t) the 
(constant) state of the root variables (j/(t) = y(0) = y). 

The synchronous update at time t is determined by 
random XOR functions, or, equivalently, by the linear op- 
eration in the Galois field GF(2) [H El 



x(t + 1) = AS it) + b = Sx(t) + Ry + b, 



(1) 



where b is a random M-dimensional Boolean vector con- 
taining the information relative to the nature of regulat- 
ing functions. The components 6j of b are or 1 with 
probability h. The evolved state after t steps is 



i-l 

xit) = S t £(0) + ^S i (i?j7+&) = S' t f(0) + S(t)(i?y + &), 



(2) 

where £(t) = J2i=o <S" (the symbol = stands for a defini- 
tion). One can observe from Eq. ^ that the dynamics 
is controlled by the topology through the interaction ma- 
trix. The peculiarity of the XOR dynamics is that every 
change of one (or an odd number of) input in a function 
determines a change in the output. We shall demonstrate 
that feedback components of the graph are the only rel- 
evant region needed for characterizing the dynamic be- 
haviour of the whole network. 

The model introduced here bears some similarities and 
some differences with the Kauffman model. One differ- 
ence is that the graph ensemble is not the same, as only 
a fraction of nodes receives input. However, as we will 
see, root nodes and feedback regions play the same role 
of nodes with a constant update function and so-called 
"relevant" nodes [U [HJ respectively, leading to an inter- 
esting parallel with more general Boolean networks [30j . 
In the standard Kauffman model, the existence of frozen 
nodes (i.e. nodes under constant update functions, where 
the variables assume the same value on every attractor) 
denotes that some links are irrelevant to the dynamics 
and effectively modifies the topology in a way that is, in 
general, difficult to control. By contrast, the simplified 
model presented here enables to separate the discussion 
and the study of topology and dynamics, the features 
of the dynamics being a repercussion of the topology of 
the network. Note however that in our case the value of 
root nodes can change with the initial conditions in con- 
trast with what happens to frozen nodes in Kauffman 
networks. 



III. RESULTS 
A. General features of XOR-dynamics 

The discrete XOR dynamics defined above allows to 
derive some simple general properties that will be used 
in the following. 

Firstly, linearity implies that the cycles have a least 
common multiple (lcm) structure, where longer nontriv- 
ial cycles can be constructed by combining smaller ones. 
As a consequence, if a network shows a set {Tj} of cycle 
lengths, then a cycle of length T' — lcm{Ti} also exists. 

From Eq. © a global state a it) belongs to a T-cycle 
if ct(T + t) = <?(t), which gives the condition 



(1 + S)x(0) + Ry + b ekerE(T), 



i.e. the vector on the left hand side is sent to by the 
function X for some T. In the eventuality of T = 1 the 
same condition becomes 

(1 + £)£(()) =Ry + b. 

The fact that the dynamics can only have transients 
or cycles translates into the formula 



S 



l+tm — gl 



(3) 



where I is the smallest integer such as ker S l — ker S l+1 
(the length of the longest transient) and t m is the max- 
imum cycle length (without repetitions) for the map 
x — ► Sx. Indeed, after a transient period of at most I 
steps, the dynamics becomes cyclic, i.e. identical con- 
figurations occur each t m steps (and S tm is the iden- 
tity matrix). This quantity can be related to the maxi- 
mum cycle for the complete dynamics using the fact that 
£(2£ m + Z) = £(/), so that for any given initial condition, 

f (2t m + 1) = S l x + E(0( Ry + b) = xil) , 

which implies that the maximum cycle for the full dy- 
namics is at most T max — 2t m . 

We now give a heuristic argument connecting the al- 
gebraic properties of S with statistical properties of the 
dynamics. If m(T) is the multiplicity of T-cycles, and 
i3(T) is the number of configurations in the basin of at- 
traction of each T-cycle, one must have 



2 w = ^£(T)to(T) 



The elements belonging to a basin of attraction are built 
adding z 6 ker S to a fixed system state x of a T-cycle 
(Note that also elements z = are taken in considera- 
tion and thus x itself is considered part of the basin). 
Thus, each attractor of length T has a basin of attrac- 
tion whose dimension is BiT) = TB(l) = T2 d , where 
d = dim(kerS f/ ). To compute the probability of a cycle 
one has to provide an estimate for m(T). From previous 
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work [24j | , we know that generally the average number of 
fixed points is m(l) = 2 N ~ M , this can be also the typical 
value, depending on the graph ensemble and the value of 
7. Since adding a fixed point to the elements of a T-cycle 
leads to a cycle of the same length, m(T) has a prefac- 
tor m(l). Supposing that the contribution of the longer 
cycles is of order one, one can write, m(T) ~ m(l), or in 
other words the multiplicity of cycles can be estimated 
using the number of fixed points. In this hypothesis, the 
probability p(T) that a state is in the basin of attraction 
of a T-cycle is 



p(T) = ±B(T)m(T) 



-yd- Mr 



(4) 



Note that this estimate holds only for the values of T that 
are possible. If T max grows faster than M, ^ T T ~ T max 
implying that P(T) will be concentrated on T max . Equa- 
tion has a practical implication for simulations: given 
a realization of a network, it enables to simulate only few 
initial conditions to find the value of T max that is reached 
with highest probability. These facts will help under- 
standing some dynamic features of the networks around 
the critical line. 
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B. Simulations 

We will now turn our attention to direct simulations of 
the model presented in Section [TTJ In these simulations 
one has to sample a number of ensemble graphs, and 
for each graph, a number of random initial conditions. 
The choice of the average performed and the quantity of 
interest leads to the definition of different observables. 

The simplest observables relate to the fixed points of 
the dynamics. Figure [2] a shows the probability of finding 
at least one fixed point as a function of the parameter 7. 
Fixed points are typically exhibited for low values of 7 
while they become rare for higher values, with a sharp 
threshold in between. One can notice that increasing 
system size makes more evident the two distinct regimes. 
Another way of presenting this result is given in Fig. Ob, 
which shows the variance of p(l) depending on 7. This 
plot has a peak whose maximum value shifts with increas- 
ing number of nodes. Accurate location of the transition 
point in Figs. Oa and^b is rendered difficult by the fact 
that for larger system sizes, where the results should be 
more reliable, sampling efficiently the initial conditions 
becomes difficult. 

These results show signatures of a dynamical transition 
point 7^? which marks the border between a region with 
typically fixed points and a region where cycles dominate 
on fixed points (for recent studies on the critical transi- 
tion between chaotic and ordered phase in Kauffman net- 
works see, e.g., 35]). According to Eq. [U for T = 1, the 
shape of the curves indicates that d becomes significantly 
different from M after a certain value of 7. In the large 



FIG. 2: Simulations of dynamics: Fixed points, (a) 

Probability of finding at least one fixed point for different val- 
ues of N (k — 3). The probability drops down after a certain 
value of 7. (b) Variance of the probability of observing one 
fixed point as a function of 7. The plot is obtained by averag- 
ing the fraction of 10 3 random initial configurations reaching 
a fixed point (on a fixed graph) over 10 3 graph realizations at 
a giver 
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FIG. 3: Simulations of dynamics: Long cycles. The plot 
presents the fraction of periods larger than the cutoff (10 6 ). 
At N and 7 given, we sampled 10 3 random networks and for 
each network we evaluated 10 initial conditions. 



system limit, if d <C M one expects that p(l) is negligi- 
ble and the probability of finding at least one fixed point 
drops to zero. At the transition point the probability of 
a fixed point and its variance should drop to zero in the 
thermodynamic limit. The effects of finite size are evi- 
dent as the measured transition point strongly depends 
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on the system dimension. We will investigate these ef- 
fects later (Section III! C[) . Note that the fraction of fixed 
points just below the threshold decreases unexpectedly 
with N. However, this has no physical meaning and is 
connected to the fact that the number of initial condi- 
tions is kept constant and consequently larger systems 
are increasingly under-sampled. 

The same transition is visible by quantifying the length 
of cycles. However this task is computationally hard al- 
ready for N ~ 100, which strongly limits this kind of 
simulations. Indeed, to implement these simulations, it 
is necessary to impose a cutoff on the length of the maxi- 
mum period observed. Figure [3] shows the fraction of net- 
works with a period larger than the cutoff imposed, given 
N and 7. This quantity grows rapidly beyond a charac- 
teristic parameter value that changes even more with the 
system dimension, and the transition point is difficult to 
locate precisely. Hence, it is necessary to find effective 
methods to investigate the dynamics and this problem 
will be discussed in Section IIIIEI after having studied 
the feedback topology of graphs in correspondence with 
the transition. 



C. Feedback topology and Leaf Removal 
algorithms 

In this section, we analyze the topology of the graphs. 
The question that we want to address is whether the dy- 
namic transition is connected to a change in the feedback 
topology of the underlying graphs. 

To study the feedback regions we use three modified 
versions for directed grap hs of the Leaf Removal (LR) 
decimation algorithm [34 [3(| . A graph decimation tech- 
nique consists in erasing iteratively nodes and links with 
some specified prescriptions. The variants we imple- 
mented remove tree-like parts of the graph, leaving the 
components with feedback. Network regions not removed 
by the algorithms are called feedback core of the graph or 
simply core. It is possible to study analytically the be- 
haviour of the algorithms in the mean-field approxima- 
tion [3 71 ] , considering for example the fraction of nodes in 
the core. The LR algorithms define a dynamics for the 
probability p n (t) that a regulated node has n inputs at 
iteration t, (i.e. of having n ones on a row of the matrix 
5 1 ), and for the probability f n (t) that a node has n out- 
puts at iteration t (n ones on a column of S). We study 
LR algorithms analytically with mean-field equations and 
compare with numerical simulations. As we will discuss, 
this approach is related to the scaling approach [3C| to 
general Kauffman networks, where one can write and an- 
alyze scaling equations for the numbers and fluctuations 
of nodes receiving input from a given number of frozen 
nodes. 

The first LR variant we analyze is the so called Leaf 
Removal Up (LRu) [27]. In each step of this algorithm 
the variables without outgoing links (the leaves) are re- 
moved together with its incoming links. The procedure is 




(c) 



FIG. 4: Leaf Removal algorithms, (a) Iterations of LRu 
for the graph presented in Fig. [T] . The LRu removes itera- 
tively the non regulating variables and their links (light grey 
nodes and dotted arrows). In this case the algorithm stops 
after two steps, (b) Iterations of LRd. The LRd removes iter- 
atively the regulating variables that are not regulated and the 
associated links (light grey nodes and dotted arrows). Only 
one step is possible in this case, (c) Iterations of LRb. First 
the LRu is applied and then all non-regulating nodes are taken 
off with the LRd. 

iterated until each variable has at least one outgoing edge 
(Fig. HJa). As a consequence of the removal of links, the 
distribution /„ of the outgoing links changes after each 
iteration t, giving rise to fluxes that can be expressed by 
first-order mean-field equations for f n {t) (see Ref. (271 ] 
and Appendix [A"|l . The solution of the mean-field flux 
equations is: 

f (i) = -i+e~^ 
/„(t) = e -A(f)M£, n>0 , 

where t — t/M and A(i) = 7/0(1 — t ). The algorithm 
stops at the reduced time i* p and the number M" p of 
remaining nodes is 

Mr = Mz* up , 

where z* p = 1 — i* p is a function of 7 and represents the 
fraction of nodes belonging to the LRu core on the total of 
regulated nodes. Figure a compares the analytic curve 
for z* p as a function of 7 with numerical results. Varying 
7, there is a singularity which is a signature of a phase 
transition between a region characterized by graphs en- 
tirely removed by the algorithm (z* = 0) and a region 
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in which the amount of remaining nodes after the pro- 
cess augments when 7 is increased. These regimes are 
separated by the "critical point" 7" p = fc _1 found with 
mean field equations. 

The second variant we analyze is the Leaf Removal 
down algorithm (LRd) (38j, which, in a similar way as 
above, removes iteratively the nodes with no incoming 
links, together with their outgoing edges, until there are 
no more roots (see Fig. H]b). The mean- field flux equa- 
tions (shown in Appendix [X| for the distribution of in- 
coming links p n (t) have solution 



LRu 



LRd 



r p (z) = (z-l) + (l-jz) k 
k " 



Pn{z) 



( 7 z)"(l - jzf- 



n>0 



where z = 1 — t. When the algorithm halts the amount 
of nodes in the LRd core is given by 



= Mz 



down 



The critical value jdown — divides the networks with 
a non-vanishing core in the large N limit from the ones 
that are removed by LRd. The plot of z* lown (~f) is pre- 
sented in Fig. 0b and compared with simulations. 

The last version of LR we consider is the combina- 
tion of LRu and LRd which we call Leaf Removal both 
(LRb). From the point of view of the dynamics this al- 
gorithm first removes all the nodes which do not regu- 
late the core and successively it removes the set of nodes 
that are fixed, at most after a transient time, the ini- 
tial conditions given. From the topological viewpoint it 
leaves all and only the nodes involved in feedback cycles 
(Fig. HJc). It can be checked that a single iteration of 
LRu and LRd is sufficient for this. The LRu algorithm 
stops at a matrix with M" p rows having k entries and 
whose remaining rows are empty. When LRd is applied 
to this matrix, elimination of all empty rows and columns 
leaves with M b c ° th = M^z* own = Mz* down z* p nodes, and 
the following per-row and per-column probabilities of n 
entries 



Pa = fo = 

Vn = i z down) 



\t Z doww) ( 1 - l Z *down) 



k—n 



f = (z* )-i e -MC P ) A(? V" 

in \^up! c n< 



(5) 

After the application of the algorithm, the distribution 
of both the outgoing and the incoming links has changed 
with respect to the initial network. In particular, no 
nodes without incoming/outgoing links may remain. In 
this case z* both = z* down z* up and M c = M b c ° th = Mz* both 
The core becomes extensive (order N) above the critical 
value -i b c oth = k~ x (Fig. HJc). 

The analytical approach described so far evaluates the 
mean value of the number of nodes in the core, in the 
thermodynamic limit TV — > 00. In order to access the 
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FIG. 5: Phase transition for LR algorithms (k — 3). The 

dashed lines represent the fraction of nodes removed by LR 
up (a), LR down (b) and LR both (c) calculated for different 
values of 7. Squares are numerical results for graphs with 
N — 10 4 nodes and blue triangles in (c) are the simulations for 
iV = 5- 10 4 . Each point is obtained averaging 10 4 simulations. 
Error bars are smaller than the dimension of the points. The 
discontinuity of the derivative points to the emergence of an 
extensive feedback core at 7 = | . (d) Comparison of analytic 
curves for the three variants of the algorithm. 



fluctuations around this value we use direct simulation. 
The shape of the distributions changes significantly with 
varying 7 and number of elements. Figure [Bja shows the 
distributions of M c with varying 7. Below the critical 
value, M c is condensed around zero, i.e. the graphs are 
typically close to being tree-like. Near the transition, the 
distributions have broad tails, and for 7 S> 7c they be- 
come symmetric around the value found with mean- field 
LR equations. This phenomenology is typical of phase 
transitions. In brief, the results show the presence of a 
phase transition, with varying the order parameter 7, be- 
tween a region of typically tree-like graphs and a region 
of extensive feedback loops. These conclusions hold inde- 
pendently from the in-degree value k. Figure [6lc shows 
the consequence of finite sizes on the LRb algorithm on 
the transition: the peak of the variance plot indicating 
the transition point is shifted to greater values and, in- 
creasing the system dimensions, it slowly approaches the 
value /c _1 . Even for systems with more than 10 nodes, 
the critical point does not reach the analytically deter- 
mined value. We call j e the maximum of the variance of 
M c i.e. the effective critical value of 7 at finite system 
size ( 7e — > 7fc,N — > 00). In the region j c < 7 < 7e , 
a residual LRb core is observed and, as we will see, this 
affects the dynamics. Thus, for small networks, j e does 
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not distinctly separate the region characterized by tree- 
like graphs and the one defined by feedback regions which 
are present also before this value. This can be also ob- 
served in Fig. [S]b which presents the scaling of M c /N. At 
a value of 7 = 0.35, M c may seem to be a sub-extensive 
quantity. However, increasing N it appears clear that 
the core is extensive, as the LR equations predict. Tak- 
ing simulations of very large systems (up to 5 TO 5 nodes), 
values of the fraction of remaining nodes in the core are 
comparable with which ones obtained from the analytic 
calculation. 



D. Feedback structure at the transition 



Having one entry per column and per row, core matri- 
ces correspond to graphs in which all nodes have (typi- 
cally) one input and one output, and they show a struc- 
ture with simple loops disconnected from each other 
(Figs. [71b and[jjc). Simulations suggest the presence of 
several disconnected components that increase in num- 
ber when the dimension N of the system increases. With 
growing N, the number of tree-like graphs, as well as 
the number of cores with only one connected component, 
decreases (Fig. [6jd) . The organization in disconnected 
loops does not depend on the in-connectivity degree k. 
The number of nodes in the core of critical networks 
scales as A^ with £ ~ 0.4 (Appendix IB]) . 



Let us now take a closer look at what happens around 
the critical point of LR algorithms. Figure [5]d shows 
that the three variants of LR have a different trend just 
after the transition value. Defining e = 7 — j c , for small 
e > the rescaled times of arrest z (i.e. the solutions 



of the equations fo{z* p ) 
Appendix |Aj become 



= and po{z* a 



= 0, see 



""down 



2fce(l - 2ke) ~ 2ke 

2ek 2 ,„ j . 2ek 2 
-(1 - 2ek) 



k - 1 



k - 1 



and for LRb: 



^both ^up^down 



4/c 3 



4fc 3 



k-l k-1 



(7 - 7c) 2 



These functions arc continuous at the transition but not 
all their derivatives are. The discontinuity in z^ oth is of 
greater order than the others, signature of transitions of 
different orders. 

We will now consider the topology of networks with 
7 ~ 7 C (critical networks). Starting from Eqns. ([5]) we 
can write 



~2ke 



l-2ke + 0(e 2 ) 



Pi * (1 



2ke 
k-l 



concluding that fy- 
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e. This means that, near 



eand^ 

pi 

to the transition point and at the thermodynamic limit, 
core matrices typically have one entry only per-column 
and per-row. In other words, they are permutations of 
the identity matrix (and consequently they are invert- 
ible). Thus, we can think of the residual (LRb) core as a 
particular Kauffman network with in-degree one (lEI . l3ll ] . 
There is, however, an important difference. In Kauffman 
networks with in-connectivity one, a node can have more 
than one outgoing edge, while in this case all nodes in 
the core must regulate exactly one element; there are no 
nodes without output (otherwise they would have been 
removed by the algorithm). 



E. Connecting feedback topology and dynamics: 
reduced dynamics 

Since topology and XOR dynamics are strictly related, 
one would expect that the feedback topology transitions 
have a repercussion on the evolution of states. We first 
observe that, around the transition point, the matrices 
are invertible, hence d = (Section IIII Ap . From this, 
the probability of having at least one fixed points falls 
down at the transition. On the other hand, before the 
transition one expects to have only fixed points, because 
of the tree-like structure of graphs. We can rearrange the 
indices of the matrix S and place leaf variables (erased 
by LRu) in the first indices and roots (erased by LRd) in 
the lasts. In this way it takes the form [27| : 





( 


c 


s = 





c 




V 


1 n 



Nodes belonging to the LRb core are collected in the sub- 
matrix C which has Mz£ oth rows. Applying S to a vector 

of the form x(t) = (f(i) , c{t) , f(t)) (see Eq. © ) one 
concludes that the evolved state of roots, r(t), is fixed 
only by initial conditions and remains constant. The 
nodes c belonging to the LRb core are determined by 
both r and c itself. Finally, the configuration of leaf 
nodes I is established by all the variables and is not af- 
fected by feedback. Thus, the behaviour of the whole 
dynamics can be deduced from the state of the core. For 
these reasons, simulations of the dynamics restricted to 
the core (reduced dynamics) are sufficient to evince the 
salient features of the dynamics. Using this fact, we are 
able to simulate networks with up to 10 3 nodes with a 
large statistic. On the other hand, obtaining data in the 
chaotic region is very difficult also using the reduced dy- 
namics (see Fig. [5]) . 

Nevertheless, the region immediately after 7 C is inter- 
esting, because it carries the consequences of the simpli- 
fied core structure in the topology. The equations for 
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FIG. 6: Feedback core, (a) Three-dimensional plot showing the histogram of the LRb-core size Mc for different values of 
gamma (for k — 3, N — 10 4 , and 10 4 different networks, colors online), (b) Fraction of nodes in the LRb core. Points represent 
the averaged values resulting from simulations (ensembles composed by 10 4 networks or 10 3 for large systems) for different 
values of 7: 0.20 (red online), 0.35 (blue online) and 0.45 (green online), (c) Variance of the fraction of the number of nodes 
belonging to the core, (d) Histogram of the number n of connected components of the core at varying N. For fixed values of 
N the number of connected parts does not remarkably change remaining around the critical value of 7 (right). 10 4 iterations 
for each value of N are simulated. 



critical core variables are special cases of Eq. ([TJ when 
each node i receives one input from node 

Ci(t + 1) = Cj(i)(t) + h , 

In case the core is a single component, then the maxi- 
mum period length T rnax will be approximately equal to 
M c . Since, near the transition, core matrices are invert- 
ible Eq. (J3J) becomes S* m = 1 (I = 0). The condition to 
have the maximum period is 

c{T max ) = Sj™*c(0) + E(T max )b = c(0) . 

The matrices S c are also permutation matrices with di- 
mension M c x M c and thus S™ c = 1. Since £(2M C ) = 0, 



the maximum period achievable is T max — 2M C ~ M c . 
In case the core is formed by several disconnected chains, 
the maximum period of the whole network can be com- 
puted as the least common multiple of the period of each 
single loop (Appendix [Cj . 



IV. DISCUSSION AND CONCLUSIONS 

In this work, we considered a model of Random 
Boolean Networks related to Kauffman networks. The 
model has the objective to investigate on abstract 
grounds some features that are important in biological 
networks, mainly (1) the presence of nodes which regulate 
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(b) (c) 



FIG. 7: Structure of a critical core, (a) shows a realization of a simulated network with N = 10 3 and M = 333 (potential 
isolated nodes are not depicted). It can be noticed that each regulated node receives exactly k = 3 inputs. The structure is 
highly simplified after the application of LRb: only three disjoint simple loops are present (b). Panel (c) reports another typical 
chain structure of a core made of a single connected component (the starting graph had N = 100 and M = 33). 



but are not regulated, existing in empirical transcription 
networks, and (2) the correspondence between topology 
and dynamics. For the latter reason we chose to use a 
XOR dynamics to define the interaction between elements. 
We have shown that in this model the dynamical as- 
pects characterizing the length of cycles and their basins 
of attraction are direct consequences of the topological 
feedback structure of the underlying networks, which we 
study with graph decimation algorithms (LRs) on a fixed 
in-degree ensemble of graphs. 

We identified a dynamical and topological phase tran- 
sition with varying input structure of the graphs mod- 
ulated by the parameter 7 (the fraction of the input- 
receiving nodes) . Direct numerical simulations of the dy- 



namics (Figs. [2] and [3]) suggest the presence of a phase 
transition where long cycles emerge. Correspondingly, a 
topological phase transition separates a tree-like graph 
region from one in which extensive feedback components 
are present. Different Leaf Removal algorithms that re- 
move the upstream tree-like regions, the downstream 
ones, or both, have the same critical point j c = k^ 1 , 
which we locate analytically and numerically. The dy- 
namic transition is found numerically at a critical value 
that is close to 7 C , but strongly influenced by finite size 
effects. Since networks below the transition typically ex- 
hibit fixed points, and this behaviour is correlated with 
the tree-like structure of these graphs, we identify the 
dynamic transition point with 7 C in the thermodynamic 
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Tmax > cutoff 




1e+05 1e+06 1e+07 1e+08 1e+09 
cutoff 



FIG. 8: Fraction / of networks which have a T max greater 
than a cutoff. Simulations made for networks with N = 999 
and k = 3. From 7 = 0.33 to 7 = 0.38 we respectively 
simulated 1000, 589, 284, 149, 95, 46 different realizations of 
the model. 



limit. Thus, despite of the large finite-size effects which 
make the simulations difficult, our results lead to the con- 
clusion that the topological and dynamic transitions are 
the same. 

This result is only valid for the ensemble of func- 
tions under consideration. Dynamics with a more general 
class of functions should present a different critical point. 
However, the value of the nodes removed by LRd is fixed 
after a transient time, independently from the dynam- 
ics chosen, because the fact that nodes receiving inputs 
from the topological core depend just on the the state 
of the core is not conditioned to the class of functions. 
The relevant dynamical part of the network must be a 
subset of this the core, and a different choice of update 
rules moves the dynamical transition away from the j c . 
It is simple to imagine that with a more general class of 
functions one can still observe fixed points for values of 
7 > 7 C . Thus, the topological critical point represents a 
lower bound for the dynamical critical point. 

Above the transition (Fig. [3]) the presence of an exten- 
sive feedback region induces chaotic dynamics, character- 
ized by exponential cycles (order 2 Mc with M c ~ O(N)). 
In this case, the initial conditions, and thus the influence 
of the external world (represented by the state of sensor 
nodes in the fraction 1 — 7), do not affect the dynamics. 
For this reason, studying the features of networks away 
from the transition is only relatively interesting. 

The most interesting region to study is the transition 
point, where the dynamics can be at the same time non- 
trivial and under external control. At this point, our 
analytical mean field theory predicts that the structure 
of feedback components in the graph is simple. Feedback 
components (independently from the in-degree) show an 
elementary modular organization in simple disconnected 
loops, found in simulations (Fig. [7]) and predicted by 
the analytical approach. In turn, this feedback struc- 
ture transposes to a modular dynamics, which can be 
completely characterized by the lcm structure of peri- 
ods. We give an analytical (large N) prediction for the 
maximum cycle achievable. Comparing with (small N) 
simulations (Fig. [TTj), the distribution of the maximum 



cycles is wide but, with increasing dimension of the sys- 
tem, the estimate becomes increasingly reliable and com- 
parable with the prediction. This point might have an 
interest for the modeling of empirical biological networks 
(which have N of the order of a few thousand). Indeed, 
at these "realistic" system sizes, this point corresponds 
to an interval with a finite span, because of finite-size 
effects, as it happens with Kauffman networks [391 ] . 

It is interesting to make a comparison between the 
model used here and the behaviour of Kauffman networks 
with no input structure in the underlying graph and 
general Boolean functions. In the latter model, the dy- 
namics is entirely controlled by so-called relevant nodes 
31. 1321 which have been the subject of many studies 
36 , [lo, S3, 52 • In networks between the chaotic and the 



stable phase, they spontaneously organize into discon- 
nected clusters whose number increases logarithmically 
with the system size [4(], [43j] . Most of relevant clusters 
are simple loops. The number of relevant nodes in critical 
networks scales as iV 1 / 3 in the limit N — > I39l. As re- 
cently found by Drossel and coworkers [13, 140L l44j , since 
the relevant part constitutes only a vanishing portion of 
the network, the topology of Kauffman networks is most 
likely very different from the topology of real biological 
networks, such as genetic regulatory networks, where one 
would expect that the majority of nodes is relevant or at 
least not always frozen. 

As anticipated in Section HH there exists an analogy 
between relavant components of Kauffman networks and 
the LRb core of the 7 model. We find that the critical 
core has a modular structure and the amount of discon- 
nected loops increases as the system size grows. 

Conside ring the scaling approach of Mihaljev and 
coworkers 30], it is interesting to observe the close cor- 
respondence between an XOR Kauffman network with an 
imposed fraction of 1 — j3 of constant functions and the 
model studied here. In this case, their approach would 
give a transition between frozen and chaotic phase lo- 
cated at p c = X/K in the parameter space. Note that, 
however (since no XOR functions are constant) the only 
way to vary (3 in our case is to change the ensemble of 
graphs, i.e. vary 7, and allow for a input-output struc- 
ture of the network. The difference between the two ap- 
proaches is acting directly on the topology (more specifi- 
cally, on the fraction of input nodes) instead of modifying 
the graph with the introduction of constant functions. 
The same reasoning suggest that this transition point we 
find is a lower bound for Kauffman networks with a frac- 
tion 1 — 7 of constant functions and a general ensemble 
of functions. 

In spite of this analogy, simulations show that the num- 
ber M c of nodes belonging to the core of critical networks 
scales as with ( ~ 0.4 (see Appendix |B|) that is differ- 
ent from the scaling exponent found for relevant compo- 
nents in critical Kauffman networks (discussed for exam- 
ple in the same study of Mihaljev and coworkers, [3(il]). 
The error on the fit seems to be small enough to exclude 
a discrepancy of 0.1 in the exponent but it is possible 
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that this deviation between the two models derives from 
the fact that we have accessed this quantity by numeri- 
cal work, and thus are limited by system size (we tested 
up to 250.000 nodes). On the other hand, we can also 
speculate that this difference derives from the fact that 
the topology of the underlying graphs in the two models 
is not the same. Indeed, in the case of a general Kauff- 
man network, nodes may become frozen because some 
of their inputs are connected to a frozen node, and the 
resulting function is constant. Thus, frozen nodes do not 
have strictly zero in-degree as in the 7 model. In our 
case, this cannot happen, as the topology is strictly con- 
trolled. In other words, the quantity M c describes the 
size of the feedback region that only for the 7 model with 
XOR dynamics coincides with the relevant dynamic re- 
gion. Studying the simplest possible case of this model 
has the further advantage of elucidating in detail some 
of the phenomenology connected to the properties of the 
critical core, its modular structure and the amount of 
disconnected loops, and thus providing an estimate for 
the size of the largest cycle in the dynamics. 

In conclusion, while simplified, the 7 model gives an 
insight into the role of the input-output structure in the 
whole-network dynamics, which is relevant for gene net- 
works. Direct application to concrete problems concern- 
ing regulatory networks would require generalizing the 
approach to more realistic representations for the input 
functions and graph ensembles (45[. In particular, for 
what concerns the dynamics, our results on the role of 
the emerging extensive feedback core, which apply to 
XOR functions, could be extended to the study of more 
complex dynamics with more realistic choice for the en- 
semble of functions. As soon as a different classes of 
Boolean functions are included in the model, the situa- 
tion becomes more complicated, as the feedback struc- 
tures responsible for the dynamical behaviour of the net- 
work would not anymore simply be the cycles in the net- 
work's topology, but a subset of them. On the other 
hand, it is possible that this problem can be bypassed 
by defining generalized pruning algorithms on the under- 
lying graphs where different "weights" are associated to 
each link or Boolean gate. Thus, the basic phenomenol- 
ogy and tools described here could be exploited in con- 
structing and approaching simplified but realistic models 
of genetic regulation networks. 



APPENDIX A: LEAF REMOVAL EQUATIONS 

1. Leaf Removal up 

To write the mean-field equations for this algorithm, 
we analyze the evolution of connectivities of the regulated 
nodes only, which are specified by the matrix S. The 
number of LR steps is the time t of the process and is the 
number of nodes that have been removed. We introduce 
the normalized time i — t/M , ( € [0, 1] so that At — 



At time t = 0, the probabilities /„ (see Section MI C| 
are 



MO) = 

/ n (0) = e -T*^ = e-*(o» 



n > 



(Al) 

fx indicating the fraction of erased nodes. When 
Mfx(i) = Mi nodes are removed, c n (i) = Mf n (t) 
nodes with n outgoing edges remain. When a node with- 
out outgoing edges is found, it is removed (it corresponds 
to a column of the matrix with no elements different from 
0) so that at each step M fx increases of one unit. Also 
its incoming links are removed, which means that the cor- 
responding row in S is cleared out, replacing with zero 
all the entries. Removing edges changes the probability 
distribution f n {i)- Indeed the probability that an edge 
that is being removed comes from a node with n outgoing 
edges (this one becoming anu-1 outgoing edges node) 
is 



nc n (t) 



nf n {t) 



so that removing an edge implies Ac„ = — 1 ■ P[ n ^n-i] + 

1 ' P[n+1— m] ■ 

Recognizing that we remove, on average, 7/0 edges per 
step [461 ] , we write the flow equations for the probabilities 
f n (i ) as follows: 



&/o(f) = "l- 



7 k 



M 



li Mt) = ^y[(»+l)/n+i(f) -nf n (t)} ,n> 

_(A2) 

with initial conditions (|A1|) and where (n) (t ) = 

E n >i n /"(*) = T, n >o n fn(t) = M 1 - *) bein g th e av- 
erage number of edges per node after Mt removals. One 
can easily check that the poissonian f n (t) given in Sec- 
tion IIII CI are solutions of Eqns. (|A2|) once having im- 
posed 



d\(t) 
dt 



jk 



1 



A(t) (n)(t) 



1-f 



from which \(i ) = jk(l - t). See Ref. [23| for details. 
The algorithm stops when /o(i) = 0, i.e. at time i* = 
e ~A(t )_ This equation implies that if 7/c < 1 the stop 
normalized time is t* — 1 or, in other words, all the graph 
is removed. There is a critical value of 7, 7 C = k^ 1 , below 
which the whole graph is cancelled after the application of 
LRu. These networks typically have a tree- like structure 
without feedback regions. 

Being t the number of variables deleted, it is useful to 
introduce the variable 



1 - t 



t 

l -M 



M-t 
M 



M- 1 and Ac„ = MAf n = 



which expresses the fraction of remained nodes during 
the LRu process . 
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2. Leaf Removal down 

In the first LRd step, all the N — M unregulated nodes 
are eliminated from the network because they have no 
inputs (this corresponds to neglecting the matrix R and 
working with only the square matrix S). 

As above, the time t of the algorithm represents the 
number of removed nodes. It is again useful to employ a 
normalized time t. The number of nodes with n entries 
at time t is r n (t) = Mp n (t) where p n {t) is the probability 
introduced in Section fill CI At time t = it reads: 



'px(O) = 

Pn(0) = 



7 n (l-7) 



k-r 



< n < k 
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FIG. 9: Distributions of the number of nodes belonging to the 
LRb core for different system dimension at fixed 7. General 
case (a) with 7 = 0.45 and critical networks (b). Simula- 
tions are produced by 10 3 iterations for N = 5 ■ 10 4 and 10 4 
iterations otherwise. 



where px is the fraction of removed nodes. In the dec- 
imation algorithm a node with no entry is cleared out 
together with its outgoing connections, so that probabil- 
ity p n (t) changes. At each step, the number of emptied 
rows rx increases by one and we obtain 



px(t)=t 



5>#) = l-t 



i>0 



The probability that a removed edge was input for a node 
with n ingoing edges is: 



(n — >n— 1) 



r„(i) 



np n (t 



T,n>l nr n{ t ) M 1 -*) 



where we remembered that (n)(i) — k"f(l — t). 
Once again, an average number kj of edges are removed 
at each step so that a node with n entries becomes, with 
probability kj-P( n _> n _i\, a node with n— 1 entries. Thus, 
one can write the flow equations: 

{f z px{z) = -l 



|p„(z) = n£^-(n + l)£^M 
.Pfe+i = , 



n ^ k, 



* 1 ^dowri G [0,1]- 



where we have already made the substitution z = 

In Section IIII CI we give 
The process ends when 
available, that is when 
= (1 — 7z) fc . Studying 
< fc _1 then there are no 
= 0, which indicates the 
which all is removed. 



the solutions to these equations, 
no more unregulated nodes are 
Po(z) = 0, namely when 1 — z 
this condition one sees that if 7 
solutions apart from the value z 
existence of a tree-like graph in 



central value predicted by LR equations. Furthermore, 
it can be observed that M c scales with N as a power 
law with exponent lower than 1 (simulations suggest a 
trend M c ~ N** with £ ~ 0.4), while nodes upstream the 
core that are removed by LRd and nodes downstream 
removed by LRu grow approximately linearly in N . Fig- 
ures [TU] show the results of simulations. The upstream 
part of the graph removed by LRd is indicated with the 
letter U and the downstream part with D. One specu- 
lates that U, D and M c at the transition point behave 
as 



U ~aN D ~ PN 



M r 



Also the particular core structure of critical networks is 
affected by finite size effects. In fact, for N = 100 it is not 
unusual to find core networks with more complex struc- 
tures than disconnected simple loops. This also explains 
the behaviour of Fig. 6(d) one can speculate from the 



plots that the LRb core description in modular structures 
becomes more accurate increasing N, while, for smaller 
systems and 7 C ^ 7 ^ 7 e , a single disconnected compo- 
nent with a more complex structure is present with more 
frequency. 
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APPENDIX B: FINITE SIZE EFFECTS 
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According to our simulations, finite size effects appear 
to be very relevant both in topological structure and dy- 
namic behaviour. Increasing A, the shape of the distribu- 
tion remains the same and it is more peaked around the 



FIG. 10: Scaling with N of the core M c . Each point of simu- 
lation is the averaged result of 10 3 iterations. The fit y — ax h 
with a ~ 0.68 and b ~ —0.58 is presented. 
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APPENDIX C: ESTIMATES OF T max 

We estimate T max considering cores built of loop 
structures [47j of lengths the first m prime numbers 

{P\,P2, ■ ■ ■ ,Pm}, SO: 

M c = ^ Pi i T = T max = Yl Pi ■ 

i iC rn i rn 

Since the nth prime number scales as nlnn, from the 
above equations one has 

M c ~ nlnn ~ — m 2 lnra 

n 
m 

In Tmax ~ In n + In In n ™ m In in , 

n 

from which 

lnT max ~ y/M c ]nM c . (CI) 

Simulated graphs show that this expression is a good up- 
per bound for the value of T = LCM{/i} (Fig. QT]a). The 



plots indicate that these values "thicken" around multi- 
ple values of M c . This suggests that residual graphs have 
a large component with dimension roughly equal to M c 
and a short loop of length 2, 3,. . . 

It is interesting to compare these estimates with 
maximum-length cycles generated by simulations of the 
core dynamics. Our numerical plots show the values 
of the maximum period T max emerging from reduced 
dynamics. In the case N = 99 the bound set by 
Eq. (jCljl is passed several times (Fig. [TT]b) because 
of finite size effects, but it becomes more reliable for 
larger systems. Figure [TTJc shows that simulated val- 
ues for T max seem to concentrate under the function 
g(M c ) ~ exp(VM c In M c ). Finally, Fig. QUd shows the 
maximum cycle distributions for the same simulations. 
These are power-law like, and become broader with the 
system dimension. Since statistics is limited by the im- 
posed cutoff, averages are highly biased by this compu- 
tational restriction and cannot be considered reliable for 
large systems or much beyond the critical line. 
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